1. INTRODUCTION

Stroke is a leading cause of mortality and disability throughout the world [1, 2]. Thus far, ischemic stroke is the most common type, accounting for 70-90% of all stroke cases [3, 4]. Deaths due to ischemic stroke remain a foremost concern [5]. The disease is an important global health problem, so an effective way to reduce mortality from ischemic stroke is needed. To diagnose whether a patient has cerebral infarction, a radiological examination is needed, and one diagnostic method often used for such examinations is computed tomography scanning (CT scan). This method is used to obtain an image of the patient's head area. When firmly demarcated dark areas are visualized surrounding the brain tissue during the test, those areas are in the chronic phase. As a result, a body function regulated by such an area tends to be permanently disrupted when early treatment is not provided.

Early medication helps to prevent the disease from reaching the chronic phase. Therefore, one important way to prevent chronic cerebral infarction is early identification, which enables the patient to obtain the right treatment and care immediately. One approach to this classification is machine learning, such as the multiple support vector machine with information gain feature selection (MSVM-IG) proposed in this study. The cerebral infarction data were obtained from RSCM hospital and cover 206 patients who had undergone the examination. Each patient is described by the features used to determine the severity of cerebral infarction; the data in this study consist of 10 features.
Previous research on the classification of cerebral infarction has been carried out using the support vector machine method [6, 7] with great results. Similarly, the information gain feature selection method has been used to detect brain [8] and lung cancer [9]. In addition, the support vector machine method has been used for the classification of schizophrenia data [10], the construction of process maps for additive manufacturing [11], pattern recognition [12], prediction of protein structural classes [13], hyperspectral imagery [14], traffic incident detection [15], image retrieval and image processing [16], fault interpretation based on 3D seismic mapping of the Zhaozhuang coal mine in the Qinshui Basin, China [17], intrusion detection systems [18], pattern recognition for AVO classification [19], estimation of reservoir porosity and water saturation based on seismic attributes [20], elastic impedance based facies classification [21], and prediction of coal and gas outburst [22].


2. RESEARCH METHOD 

This research proposes a multiple support vector machine with information gain feature selection (MSVM-IG) for early cerebral infarction classification. MSVM-IG uses the support vectors obtained from SVM as the input to feature selection. Therefore, the amount of data processed by the IG feature selection is smaller than the initial data set. The term "multiple" is used because SVM is evaluated again after the feature selection process with IG. Because the amount of input data is reduced, IG feature selection is expected to rank features more accurately, so that SVM produces better accuracy.


2.1. Data 

The numeric data used in this study were obtained from CT scan results at Cipto Mangunkusumo Hospital, Central Jakarta. The data consist of 10 features: Gender, Age of patient, Cerebral infarction area, Air normal cavity, Minimum value of area, Maximum value of area, Sum of acute point, Length of area, Average of area, and Standard deviation of area. The data include 206 observations, of which 103 are labeled positive for infarction and 103 are labeled negative.


2.2. Information gain feature selection 

Information gain (IG) is a filter-type feature selection technique that works by ranking features according to their IG values. IG is based on the concept of entropy: it is the difference between the entropy of all training data and the weighted sum of the entropies of the subsets obtained by partitioning the data on the values of a feature [23]. IG is also one of the easiest and fastest methods of ranking features. For example, suppose there is a training data set $S$ with $n$ features and $m$ classes, $S(A_1, A_2, \dots, A_n, C)$, where $C$ is the class attribute consisting of $m$ different classes. The entropy of all training data is calculated over the $m$ different classes as follows:


$$\mathrm{Entropy}(S) = -\sum_{i=1}^{m} p(c_i) \log_2 p(c_i) \tag{1}$$


where $p(c_i)$ is the probability (relative frequency) of class $c_i$ in the training data $S$. Suppose feature $A$ takes $l$ different values, which are used to calculate the weighted sum of the entropies of the partition subsets. Each value has an entropy based on the class labels, so let $S_v$ denote the subset of $S$ in which feature $A$ takes its $v$-th value, where $v = 1, 2, \dots, l$. The weighted sum of the entropies of the partition subsets on a feature is then formulated as follows:


$$\sum_{v=1}^{l} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v) \tag{2}$$

As previously explained, IG is obtained as the difference between the entropy of all training data and the weighted sum of the entropies of the partition subsets on a feature. Therefore, the difference between equations (1) and (2) is the IG of a feature [23]:


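$$IG(S, A) = \mathrm{Entropy}(S) - \sum_{v=1}^{l} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v) \tag{3}$$

$$= -\sum_{i=1}^{m} p(c_i) \log_2 p(c_i) - \sum_{v=1}^{l} \frac{|S_v|}{|S|} \, \mathrm{Entropy}(S_v) \tag{4}$$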
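
As an illustration of equations (1) to (3), the following is a minimal Python sketch that computes the information gain of a single discrete feature. The function names and the toy data are hypothetical and are not taken from this study; continuous features such as the CT scan measurements would first need to be discretized, a step that is not shown here.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy (base 2) of a list of class labels, as in equation (1)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """IG(S, A) for one discrete feature A, as in equation (3)."""
    # Partition the labels by the value that the feature takes (the subsets S_v).
    subsets = defaultdict(list)
    for value, label in zip(feature_values, labels):
        subsets[value].append(label)
    total = len(labels)
    # Weighted sum of the subset entropies, as in equation (2).
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


# Hypothetical toy data: four samples, two classes.
labels = [1, 1, 0, 0]
informative = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]  # carries no class information

ig_informative = information_gain(informative, labels)
ig_uninformative = information_gain(uninformative, labels)
```

In this toy example, the feature that separates the two classes receives the full entropy of the labels as its gain, while the other feature receives a gain of zero, so ranking by IG places the informative feature first.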


2.3. Support vector machine 

Support vector machine (SVM), introduced by Vapnik in the 1990s, is a machine learning algorithm used for classification and regression. SVM is related to structural risk minimization (SRM) and was initially used for binary classification. It is now also used for multiclass classification. SVM maps the input space into a higher dimensional space to support nonlinear classification problems, and the separating hyperplane with the maximum margin is constructed in that space. The hyperplane is a linear decision boundary whose maximum margin provides separation between the decision classes.

In the dataset $\{x_i, y_i\}_{i=1}^{N}$, $N$ is the number of samples, $x_i \in \mathbb{R}^D$ is the feature vector of sample $i$, $D$ is the number of features (dimension), and $y_i$ is the class label. For the two-class classification problem $y_i \in \{-1, +1\}$, while in the multiclass case $y_i \in \{1, 2, \dots, k\}$, where $k$ is the number of classes. The main goal of SVM is to determine the best hyperplane [24], as illustrated in Figure 1:


$$w \cdot x + b = 0 \tag{5}$$


Figure 1. SVM is trying to determine the best hyperplane to separate two classes


The SVM optimization problem is summarized as follows:

$$\min_{w, b} \; \frac{1}{2} \| w \|^2 \tag{6}$$

$$\text{s.t.} \quad y_i (w^{T} \cdot x_i + b) \ge 1, \quad \forall i = 1, \dots, N \tag{7}$$


The objective function (6) is minimized to determine $w \in \mathbb{R}^D$ and $b \in \mathbb{R}$ subject to (7), where $w$ is the weight vector and $b$ is the bias. Solving this problem gives $w$ and $b$ as follows:


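$$w = \sum_{i=1}^{N} \alpha_i y_i x_i \tag{8}$$

$$b = \frac{1}{N_S} \sum_{i \in S} \left( y_i - \sum_{m \in S} \alpha_m y_m \, x_m \cdot x_i \right) \tag{9}$$

where $\alpha_i$ are the Lagrange multipliers, $S$ is the set of indices of the support vectors, and $N_S$ is the number of support vectors. The decision function is as follows:

$$f(x) = \mathrm{sign}(w \cdot x + b) \tag{10}$$
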
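
To make equations (8) to (10) concrete, the sketch below computes the weight vector, the bias, and the decision function from a given set of support vectors and Lagrange multipliers. It is only an illustration under stated assumptions: the function names are hypothetical, the multipliers are supplied by hand for a two-point toy problem rather than obtained from an optimizer, and the bias is averaged over the support vectors, which is one common convention.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weights(alphas, ys, xs):
    """w = sum_i alpha_i * y_i * x_i over the support vectors, equation (8)."""
    dim = len(xs[0])
    return [sum(a * y * x[j] for a, y, x in zip(alphas, ys, xs)) for j in range(dim)]


def bias(alphas, ys, xs):
    """Bias averaged over the support vectors, equation (9)."""
    w = weights(alphas, ys, xs)
    return sum(y - dot(w, x) for y, x in zip(ys, xs)) / len(xs)


def decide(x, w, b):
    """f(x) = sign(w . x + b), equation (10)."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical toy problem: one support vector per class.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
support_labels = [1, -1]
multipliers = [0.25, 0.25]  # these values satisfy the margin constraints (7) with equality

w = weights(multipliers, support_labels, support_vectors)
b = bias(multipliers, support_labels, support_vectors)
```

For this toy problem the hyperplane passes through the origin, and both support vectors lie exactly on the margin, that is, $y_i (w \cdot x_i + b) = 1$.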
The flow of the proposed method is shown in Figure 2. First, the data are processed by SVM to generate the support vectors. Then, IG feature selection ranks and selects the features based on the support vectors. Lastly, SVM is applied again to the selected features to obtain the performance measurements.
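
The three steps of this flow can be sketched in Python as follows. This is a minimal sketch under several assumptions and is not the implementation used in this study: it relies on scikit-learn, uses `mutual_info_classif` as a stand-in for the IG ranking of continuous features, runs on synthetic data from `make_classification`, passes the kernel parameter through scikit-learn's `gamma` argument, and evaluates the second SVM on all samples restricted to the selected features. The function name `msvm_ig` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def msvm_ig(X, y, n_features=5, C=1000, gamma=1e-4, folds=3):
    """Sketch of the flow: SVM -> feature ranking on support vectors -> SVM."""
    X = np.asarray(X)
    y = np.asarray(y)

    # Step 1: fit an SVM on all features and keep only the support vectors.
    first_svm = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
    sv_idx = first_svm.support_

    # Step 2: rank the features using the support vectors only
    # (assumption: mutual information stands in for information gain).
    scores = mutual_info_classif(X[sv_idx], y[sv_idx], random_state=0)
    selected = np.argsort(scores)[::-1][:n_features]

    # Step 3: evaluate a second SVM on the selected features with k-fold
    # cross-validation (assumption: all samples are used at this step).
    second_svm = SVC(kernel="rbf", C=C, gamma=gamma)
    accuracy = cross_val_score(second_svm, X[:, selected], y, cv=folds).mean()
    return selected, accuracy


# Synthetic stand-in data with the same shape as the study's data (206 x 10).
X, y = make_classification(n_samples=206, n_features=10, n_informative=5,
                           random_state=0)
selected, accuracy = msvm_ig(X, y)
```

For brevity, the feature ranking here is computed once on the full data set before cross-validation; a stricter evaluation would repeat the ranking inside each training fold.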





(Figure 2 steps: 1. SVM produced support vectors; 2. IG selected the features; 3. SVM evaluated the classification)

Figure 2. The flow diagram of MSVM-IG


2.4. Kernel function

This research utilizes two kernel functions, namely the radial basis function and polynomial kernel functions, with several parameter values. The kernel function is defined as follows:


$$K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle \tag{11}$$


where $\varphi(x)$ is a function that maps $x \in \mathbb{R}^D$ to the feature space $F$. Every time $\langle \varphi(x_i), \varphi(x_j) \rangle$ appears in the classification algorithm, it is replaced with $K(x_i, x_j)$ [25]. By using kernel functions, the data are expected to become linearly separable in the higher dimensional space. The formulas of the radial basis function (RBF) and polynomial kernels are shown below [26].
- RBF kernel function:

$$k(x, y) = \exp\left(-\gamma \| x - y \|^2\right) \tag{12}$$

- Polynomial kernel function:

$$k(x, y) = \left[ (x \cdot y) + 1 \right]^{d} \tag{13}$$
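
As a small numerical illustration of equations (12) and (13), the two kernels can be written in plain Python as below. The function names and the example vectors are hypothetical, and the parameter names `gamma` and `d` simply mirror the symbols in the formulas.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2), equation (12)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def polynomial_kernel(x, y, d):
    """Polynomial kernel ((x . y) + 1)^d, equation (13)."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** d


# Hypothetical example vectors.
u = [1.0, 2.0]
v = [3.0, 4.0]

k_rbf_same = rbf_kernel(u, u, gamma=0.5)   # identical points give the maximum value 1
k_rbf = rbf_kernel(u, v, gamma=0.5)        # ||u - v||^2 = 8, so this is exp(-4)
k_poly = polynomial_kernel(u, v, d=2)      # (11 + 1)^2
```

The RBF kernel depends only on the distance between the two points and decays toward zero as they move apart, whereas the polynomial kernel grows with their inner product.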
2.5. Model performance evaluation 

In this study, model performance was evaluated by measuring accuracy, precision, sensitivity (recall), specificity, and F1-score. Let TN, TP, FN, and FP denote true negative, true positive, false negative, and false positive, respectively. The following formulas are used [27]:





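$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{14}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{15}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{16}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{17}$$

$$\mathrm{F1\text{-}Score} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{18}$$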
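
The five measures in equations (14) to (18) can be computed directly from the confusion-matrix counts, as in the short Python sketch below. The function name and the counts are hypothetical and are used only for illustration; recall in equation (18) is taken to be the same quantity as sensitivity.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the measures of equations (14)-(18) as fractions in [0, 1]."""
    accuracy = (tp + tn) / (tp + tn + fn + fp)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    f1_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1_score": f1_score,
    }


# Hypothetical counts for a balanced test set of 204 samples.
metrics = classification_metrics(tp=85, tn=79, fp=23, fn=17)
percentages = {name: round(100 * value, 3) for name, value in metrics.items()}
```

The sketch assumes that every denominator is nonzero; a classifier that never predicts the positive class would need a guard for the precision and F1-score terms.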


3. RESULTS AND ANALYSIS 

The conventional support vector machine with information gain feature selection (IG-SVM), that is, without multiple SVM, is used as the baseline for comparison with the proposed method. Two kernel functions were used, namely radial basis function (RBF) and polynomial. Ten values of σ and d are used in the RBF and polynomial kernels, respectively, with the same parameter values for both methods: C = 1000, k-fold = 3, and 5 main features.


3.1. Classification results with RBF kernel 


For the RBF kernel, we tried 10 different σ values that we determined randomly. The results are listed in Tables 1 and 2.


Table 1. Results of cerebral infarction classification using MSVM-IG with RBF kernel

σ       Accuracy (%)  Precision (%)  Sensitivity (%)  Specificity (%)  F1-score (%)
0.0001  81.863        81.553         83.333           81.372           81.951
0.001   81.127        79.812         83.333           78.922           81.535
0.05    80.882        79.439         83.333           78.431           81.34
0.1     80.76         79.254         83.333           78.186           81.243
1       80.686        79.143         83.333           78.039           81.184
10      80.637        79.07          83.333           77.941           81.146
50      80.602        79.017         83.333           77.871           81.118
100     80.576        78.978         83.333           77.819           81.097
1000    80.556        78.947         83.333           77.778           81.081
10000   80.539        78.923         82.353           77.745           81.068


Table 2. Results of cerebral infarction classification using IG-SVM with RBF kernel

σ       Accuracy (%)  Precision (%)  Sensitivity (%)  Specificity (%)  F1-score (%)
0.0001  80.763        80.490         82.493           80.872           80.661
0.001   80.137        79.612         82.343           79.722           80.524
0.05    80.782        79.339         81.433           79.431           80.245
0.1     79.751        78.154         81.433           79.187           80.143
1       79.586        78.133         81.333           79.029           80.104
10      79.507        78.071         81.333           78.831           79.144
50      78.492        78.017         81.333           78.771           79.137
100     78.466        77.968         80.443           78.519           79.035
1000    78.446        77.847         80.443           77.769           79.033
10000   78.429        77.623         80.443           77.750           79.021





According to Tables 1 and 2, the smaller the value of σ, the better the classification results, with the highest accuracy, precision, sensitivity, specificity, and F1-score values obtained when σ = 0.0001 for both methods. This is because, with a smaller value of σ, the classification method learns the data patterns faster and produces better results. MSVM-IG produces better results than IG-SVM, with its highest accuracy, precision, sensitivity, specificity, and F1-score being 81.863%, 81.553%, 83.333%, 81.372%, and 81.951%, respectively. The difference between the two methods is approximately 1%; nevertheless, MSVM-IG is the method of choice for the classification of cerebral infarction.


3.2. Classification results with polynomial kernel 

Also, for the polynomial kernel, we tried 10 different d values that we determined randomly. The results are listed in Tables 3 and 4. For MSVM-IG (Table 3), d values from 1 to 10 produced the same accuracy, precision, sensitivity, specificity, and F1-score.


Table 3. Results of cerebral infarction classification using MSVM-IG with polynomial kernel

d   Accuracy (%)  Precision (%)  Sensitivity (%)  Specificity (%)  F1-score (%)
1   80.392        78.704         83.333           77.451           80.952
2   80.392        78.704         83.333           77.451           80.952
3   80.392        78.704         83.333           77.451           80.952
4   80.392        78.704         83.333           77.451           80.952
5   80.392        78.704         83.333           77.451           80.952
6   80.392        78.704         83.333           77.451           80.952
7   80.392        78.704         83.333           77.451           80.952
8   80.392        78.704         83.333           77.451           80.952
9   80.392        78.704         83.333           77.451           80.952
10  80.392        78.704         83.333           77.451           80.952





Table 4. Results of cerebral infarction classification using IG-SVM with polynomial kernel

d   Accuracy (%)  Precision (%)  Sensitivity (%)  Specificity (%)  F1-score (%)
1   79.882        78.145         82.954           77.211           79.534
2   79.792        78.145         82.833           77.211           79.534
3   79.592        78.144         82.573           77.211           79.534
4   78.456        77.7605        82.573           76.352           78.726
5   78.455        77.7605        82.573           76.352           78.726
6   78.444        77.7605        82.573           76.352           78.726
7   78.340        77.7605        82.573           76.352           78.726
8   78.340        77.7605        81.997           76.352           77.942
9   78.340        77.7605        81.997           75.451           77.942
10  78.340        77.7605        81.997           75.441           77.942





According to Tables 3 and 4, the smaller the value of d, the better the classification results: the highest accuracy, precision, sensitivity, specificity, and F1-score values are obtained when d = 1 for both methods. With a smaller value of d, the classification method learns the data patterns faster and produces better results. The results of MSVM-IG are better than those of IG-SVM, with its highest accuracy, precision, sensitivity, specificity, and F1-score being 80.392%, 78.704%, 83.333%, 77.451%, and 80.952%, respectively. The difference between the two methods is approximately 1%; nevertheless, MSVM-IG tends to be the method of choice for the classification of cerebral infarction.


4. CONCLUSION

Stroke is the second leading cause of death and the third leading cause of disability. Ischemic stroke is the most common type, so an efficient way to classify stroke is needed. This study proposed a multiple support vector machine with information gain feature selection (MSVM-IG) for the classification of cerebral infarction. The RBF and polynomial kernel functions were used. Based on the results and discussion, MSVM-IG produced its best accuracy, sensitivity, specificity, and F1-score when using the RBF kernel (σ = 0.0001), with an accuracy of 81.863%. Compared with the conventional method, namely support vector machine with information gain feature selection (IG-SVM), the MSVM-IG results were approximately 1% higher. This indicates that MSVM-IG gives better results than the conventional method. For future work, this modification could be improved further, and other kernel functions and techniques can be used for comparison.